This repository has been archived by the owner on Oct 20, 2024. It is now read-only.

feat(huff_lexer): lexer using the logos crate #118

Draft
wants to merge 17 commits into
base: huff_parser
from


nvnx7
Collaborator

@nvnx7 nvnx7 commented Jun 23, 2022

Lexer written using the logos crate, with the following tweaks/benefits:

  • A lot of complexity is abstracted away thanks to logos.
  • All tokens are now lexed properly, incl. opcodes, hex/string literals etc. It is able to lex the entire erc20.huff without errors.
  • Opcodes/EVM types are lexed based on context, e.g. address is both an EVM primitive type and an opcode, but it'll be lexed as an opcode only when used within a macro body, as a type only when inside the param list of functions/events, and as an ident otherwise. The same goes for the other EVM primitive types and array types.
  • Eof and Whitespace tokens are removed. Whitespace is simply ignored, and EOF can be inferred when the lexer iterator returns None.
  • The lexer iterator's associated type is now simply Token instead of a Result. So you just do token.unwrap() instead of token.unwrap().unwrap().
  • A notable change is that TokenKind enum variants hold data (if any) only of type &'a str, avoiding mixed types. This makes it more memory efficient because of the way enums behave in Rust (an enum is at least as large as its largest variant). Any necessary conversion is done in the parser or wherever required.
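
To make the context-based lexing, the plain (non-Result) iterator item and the &'a str payloads above a bit more concrete, here is a rough std-only sketch. It is hand-written and does not use logos, so it is not the code in this PR. The names `Context`, `classify`, `lex`, `OPCODES`, `TYPES` and the `Error` variant are all made up for illustration, and the context-switching rule (braces enter/leave a macro body, a paren at top level opens a param list) is a deliberate simplification of what a real Huff lexer would need.

```rust
/// Where the lexer currently is (hypothetical; simplified from real Huff).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Context { Global, MacroBody, ParamList }

/// Every data-carrying variant holds a slice of the source, never an owned
/// or numeric value; conversion is left to the parser.
#[derive(Clone, Copy, PartialEq, Debug)]
enum TokenKind<'a> {
    Opcode(&'a str),
    PrimitiveType(&'a str),
    Ident(&'a str),
    OpenBrace, CloseBrace, OpenParen, CloseParen, Comma,
    Error(&'a str), // unrecognised input is a token, not an Err
}

// Tiny illustrative subsets, not the full EVM lists.
const OPCODES: [&str; 3] = ["address", "caller", "add"];
const TYPES: [&str; 3] = ["address", "uint256", "bool"];

/// Decide what a word means based on the current context.
fn classify<'a>(word: &'a str, ctx: Context) -> TokenKind<'a> {
    match ctx {
        Context::MacroBody if OPCODES.iter().any(|op| *op == word) => TokenKind::Opcode(word),
        Context::ParamList if TYPES.iter().any(|ty| *ty == word) => TokenKind::PrimitiveType(word),
        _ => TokenKind::Ident(word),
    }
}

fn is_word(c: char) -> bool { c.is_ascii_alphanumeric() || c == '_' }

struct Lexer<'a> { src: &'a str, pos: usize, ctx: Context }

impl<'a> Lexer<'a> {
    fn new(src: &'a str) -> Self { Lexer { src, pos: 0, ctx: Context::Global } }
}

impl<'a> Iterator for Lexer<'a> {
    // The item is the token itself; there is no Result layer to unwrap.
    type Item = TokenKind<'a>;

    fn next(&mut self) -> Option<Self::Item> {
        let src: &'a str = self.src;
        let rest = &src[self.pos..];
        // Whitespace is skipped and never emitted as a token.
        let trimmed = rest.trim_start();
        self.pos += rest.len() - trimmed.len();
        // End of input is signalled by None; there is no Eof token.
        let c = trimmed.chars().next()?;
        if is_word(c) {
            let len = trimmed.find(|ch: char| !is_word(ch)).unwrap_or(trimmed.len());
            self.pos += len;
            return Some(classify(&trimmed[..len], self.ctx));
        }
        self.pos += c.len_utf8();
        Some(match c {
            '{' => { self.ctx = Context::MacroBody; TokenKind::OpenBrace }
            '}' => { self.ctx = Context::Global; TokenKind::CloseBrace }
            '(' => {
                if self.ctx == Context::Global { self.ctx = Context::ParamList; }
                TokenKind::OpenParen
            }
            ')' => {
                if self.ctx == Context::ParamList { self.ctx = Context::Global; }
                TokenKind::CloseParen
            }
            ',' => TokenKind::Comma,
            _ => TokenKind::Error(&trimmed[..c.len_utf8()]),
        })
    }
}

/// Convenience helper: collect all tokens of a source string.
fn lex(src: &str) -> Vec<TokenKind<'_>> { Lexer::new(src).collect() }
```

With this sketch, calling `lex("event Transfer(address) macro MAIN() { address } address")` classifies the three occurrences of address as a primitive type, an opcode and an ident respectively, and the iterator simply ends with None once the input is consumed.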

lmk any suggestions :)

@devtooligan devtooligan marked this pull request as draft February 20, 2023 19:00
@devtooligan
Contributor

this has gotten really old now. changing to draft.

